Dealing with Incomplete Data in Clustering

Authors

  • Sunita Soni
  • Bala Buksh
Abstract

Over the years, clustering methods for numeric, categorical, and mixed data have advanced significantly. A newer challenge is clustering data with missing attribute values. Early algorithms used fuzzy c-means to partition the data into fuzzy clusters and relied on separate estimation algorithms to fill in the missing values. More recently, Hathaway and Bezdek proposed four strategies for effective clustering of incomplete data: the Whole Data Strategy (WDS), the Partial Distance Strategy (PDS), the Optimal Completion Strategy (OCS), and the Nearest Prototype Strategy (NPS). This paper briefly surveys each of the four approaches and the recent developments in adopting them for treating incomplete data.
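As an illustration of two of the strategies named above, the following is a minimal sketch, not the authors' implementation. It assumes missing attribute values are encoded as NaN and follows the commonly cited formulations: WDS discards every incomplete record, while PDS computes the squared distance to each prototype over the observed attributes only and rescales it by the ratio of total to observed attributes. The function names `whole_data_rows` and `partial_distance_sq` and the sample arrays are hypothetical. OCS and NPS, which fill in missing values during the iterations, are not shown.

```python
import numpy as np

def whole_data_rows(X):
    """WDS sketch: keep only the rows that have no missing (NaN) attribute."""
    X = np.asarray(X, dtype=float)
    return X[~np.isnan(X).any(axis=1)]

def partial_distance_sq(X, V):
    """PDS sketch: squared distance from each row of X to each prototype in V,
    summed over observed attributes and rescaled by s / (number observed)."""
    X = np.asarray(X, dtype=float)
    V = np.asarray(V, dtype=float)
    observed = ~np.isnan(X)                          # shape (n, s)
    n_obs = observed.sum(axis=1)                     # observed count per row
    if np.any(n_obs == 0):
        raise ValueError("each record needs at least one observed attribute")
    # Differences on missing attributes are replaced by 0 so they do not contribute.
    diff = np.where(observed[:, None, :], X[:, None, :] - V[None, :, :], 0.0)
    s = X.shape[1]                                   # total number of attributes
    return (s / n_obs)[:, None] * (diff ** 2).sum(axis=2)

# Hypothetical data: three records with two attributes, two cluster prototypes.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])
V = np.array([[0.0, 0.0],
              [3.0, 4.0]])

complete = whole_data_rows(X)     # only the first record survives
D = partial_distance_sq(X, V)     # shape (3, 2): records by prototypes
```

In a fuzzy c-means loop, a matrix such as `D` would stand in for the usual squared Euclidean distances when updating memberships, so incomplete records still contribute to the clustering rather than being dropped.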

Similar resources

A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data

The fuzzy c-means clustering algorithm is a useful tool for clustering; but it is convenient only for crisp complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...

Application of Clustering Methods in DNA Microarrays

Background: DNA microarray technology has paved the way for investigators to study the expression of thousands of genes in a short time. Analysis of this large amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering techniques in DNA microarray analysis. Materials and methods: We analyzed data from the Van’t Veer et al. study dealing with BRCA1...

Application of modified balanced iterative reducing and clustering using hierarchies algorithm in parceling of brain performance using fMRI data

Introduction: Clustering of the human brain is a very useful tool for the diagnosis, treatment, and tracking of brain tumors, and several methods exist for this purpose. In this study, modified balanced iterative reducing and clustering using hierarchies (m-BIRCH) was introduced for brain activation clustering. This algorithm has an appropriate speed and good scalability in dealing ...

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the rapid growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...

Using the Principal Component Analysis Approach to Weight Statistical, Climatic, and Geographic Attributes of Maximum 24-Hour Rainfall and Spatial Analysis of Clustering (Case Study: Lake Urmia Basin)

Regionalization is one of the useful tools for carrying out effective analyses in regions that lack data or have only incomplete data. One of the regionalization methods widely used in hydrological studies is the clustering approach. Another factor that affects clustering is the degree of importance and participation level of each of these attributes. In this study, it was t...

Publication date: 2016